Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add user defined metadata #5915

Merged
merged 10 commits into from
Jul 2, 2024
Merged

Conversation

criccomini
Copy link
Contributor

Which issue does this PR close?

Closes #4754

Rationale for this change

We would like to be able to read and write user-defined metadata for AWS, GCP, and Azure. This use-case is mentioned briefly in #4754 before the conversation shifts to tags. It appears that tags were implemented, but user-defined metadata has not yet been.

What changes are included in this PR?

  • Add user-defined metadata for AWS, GCP, and Azure clients.

Are there any user-facing changes?

I've added a Metadata(String) attribute to the Attribute enum. Other than that I don't believe there are any user-facing changes.

I do not believe there are any breaking changes. The user-facing change (Attributes::Metadata(String)) is additive and should be compatible. The one caveat here is I'm not sure how Rust handles enums (open vs. closed). If it handles them the way Java does (i.e. adding new enums can break old code) then this would be a breaking change. I'll leave it to the Rust experts to clarify this.

@github-actions github-actions bot added the object-store Object Store Interface label Jun 18, 2024
@criccomini
Copy link
Contributor Author

There are a few implementation details I'd like to call out:

  1. The reqwest header name handling returns all custom headers as lower case. So if a user sets a Metadata("FOO"), the results returned in GetResult will have Metadata("foo"). This is somewhat unintuitive.
  2. I elected to make USER_DEFINED_METADATA_HEADER_PREFIX a const rather than static because it's referenced in certain const spots. Rust didn't like static variables referenced from const.
  3. I'm not thrilled about the get.rs code I added, but I couldn't figure out a better way. We need to iterate over all headers with the user-defined prefix (cloud-specific) and add them to attributes. I'm open to alternative approaches.
  4. I did not add WebDav (http) support. The client simply ignores Metadata attributes on put() right now.

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me overall!

Some missing works could be tracking in follow up:

  • Add custom metadata support in multipart upload
  • Add custom metadata integration test
  • Track the current progress of webdav(http)
  • Local fs might support custom metadata by xattr

Some design decision like naming need reviews from @tustvold and @alamb.

@@ -61,6 +61,7 @@ use std::sync::Arc;

const VERSION_HEADER: &str = "x-amz-version-id";
const SHA256_CHECKSUM: &str = "x-amz-checksum-sha256";
const USER_DEFINED_METADATA_HEADER_PREFIX: &str = "x-amz-meta-";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe METADATA_HEADER_PREFIX is clear enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to rename to whatever. I chose to be quite verbose because I feel there's a lot of confusion about "metadata". You can see this play out in #4754, where the conversation muddles user-defined metadata and tags. Add in standard (non-user-defined) metadata and it gets even more murky.

Y'all let me know what to rename to, though. :)

object_store/src/client/get.rs Show resolved Hide resolved
object_store/src/client/get.rs Outdated Show resolved Hide resolved
@@ -183,6 +183,8 @@ impl Client {
has_content_type = true;
builder.header(CONTENT_TYPE, v.as_ref())
}
// Ignore metadata attributes
Attribute::Metadata(_) => builder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebDAV handles custom metadata in prop, which differs from other object storage services. We can create an issue to track this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -642,6 +647,7 @@ impl GetClient for S3Client {
etag_required: false,
last_modified_required: false,
version_header: Some(VERSION_HEADER),
user_defined_metadata_prefix: Some(USER_DEFINED_METADATA_HEADER_PREFIX),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about using metadata_header_prefix?

object_store/src/aws/client.rs Show resolved Hide resolved
@Xuanwo
Copy link
Member

Xuanwo commented Jun 19, 2024

The one caveat here is I'm not sure how Rust handles enums (open vs. closed).

Rust handles this by non_exhaustive attribute.

@criccomini
Copy link
Contributor Author

criccomini commented Jun 19, 2024

Thanks @Xuanwo. I've updated the PR with your suggestions. I haven't renamed the variables yet since it sounds like @alamb or @tustvold need to weigh in.

Regarding your note about "Add custom metadata integration test", I was hoping this change would test the new functionality. Is it not enough?

Regarding multipart upload, I think this is already handled. AWS's client has:

    pub async fn create_multipart(
        &self,
        location: &Path,
        opts: PutMultipartOpts,
    ) -> Result<MultipartId> {
        let response = self
            .request(Method::POST, location)
            .query(&[("uploads", "")])
            .with_encryption_headers()
            .with_attributes(opts.attributes)
            .with_tags(opts.tags)
            .idempotent(true)
            .send()
            .await?
            .bytes()
            .await
            .context(CreateMultipartResponseBodySnafu)?;

        let response: InitiateMultipartUploadResult =
            quick_xml::de::from_reader(response.reader()).context(InvalidMultipartResponseSnafu)?;

        Ok(response.upload_id)
    }

The opts.attributes in with_attributes is an Attributes, which now can contain Metadata(String). And with_attributes has been modified to set x-amz-meta- headers.

@alamb alamb requested a review from tustvold June 19, 2024 21:06
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THank you @criccomini and @Xuanwo for the review

Please ping me if you need another review. I think @tustvold may be back next week and he might get to reviewing this first

@criccomini
Copy link
Contributor Author

Sounds good! I'll check in next week. 😄

/// Specifies a user-defined metadata field for the object
///
/// The String is a user-defined key
Metadata(String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely a silliness that will make no measurable performance difference, but given these will often be static strings, I wonder what you think of making this Cow<'static, str>

if let Some(prefix) = T::HEADER_CONFIG.user_defined_metadata_prefix {
for (key, val) in response.headers() {
if let Some(suffix) = key.as_str().strip_prefix(prefix) {
if let Ok(val_str) = val.to_str() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should error here?

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nits, nice to see the benefits of all the time spent DRYing up the client logic in making this a small change 😄

@criccomini
Copy link
Contributor Author

@tustvold I'm ready for another review here. I think I addressed all of your comments.

I had some issues with the attributes.insert() logic after I made the change. I ended up having to do .to_string().into() in order to disentangle the lifetimes (I think?). LMK if that's not the right approach here.

@criccomini
Copy link
Contributor Author

Gentle nudge here =)

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me thank, sorry for the delay

@tustvold tustvold merged commit cad5735 into apache:master Jul 2, 2024
13 checks passed
@criccomini criccomini deleted the add-user-defined-metadata branch July 2, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support User-Defined Object Metadata
4 participants